Antipattern: Use FLOAT Data Type
Let’s learn how using the FLOAT data type for non-integer data affects the output.
We'll cover the following
Most programming languages support a data type for real numbers, called float
or double
. SQL supports a similar data type of the same name. Many programmers naturally use the SQL FLOAT
data type everywhere they need fractional numeric data because they are accustomed to programming with the float
data type.
The FLOAT
data type in SQL, like float
in most programming languages, encodes a real number in a binary format according to the IEEE 754 standard. We need to understand some characteristics of floating-point numbers in this format to use them effectively.
Rounding by necessity#
Many programmers are not aware of a characteristic of this floating-point format: not all values that can be described in decimals can be stored in binary. Out of necessity, some numbers must be rounded to a very close value.
To give some context for this rounding behavior, let’s look at rational numbers that have repeating decimal numbers, such as one-third, written as “0.333…”. The true value cannot be represented in decimal because we must write an infinite number of digits. The number of digits is the precision of the number, so that a repeating decimal number would require infinite precision.
The compromise is to use finite precision, which is choosing a numeric value as close as possible to the original value, for example, “0.333”. However, this means that the value isn’t the same number that we intended.
Even if we increase the precision, we still can’t add three of these approximations of one-third to get a true value of 1.0. This is the necessary compromise of using finite precision to represent numbers that may have repeating decimals.
This means that many legitimate numbers that we can imagine cannot be represented with finite precision. We may think it to be acceptable because we can’t really type a number with infinite digits anyway, so, naturally, we should make do with typing any number with finite precision and storing it precisely — right? Unfortunately not.
IEEE 754 represents floating-point numbers in a base-2 format. The values that require infinite precision in binary are different values from those that behave this way in decimal. Some values that only need finite precision in decimal, for instance, 59.95, require infinite precision to be represented exactly in binary. The FLOAT
data type can’t do this, so it uses the closest value in base-2 it can store, which is equal to 59.950000762939 in base-10.
Some values coincidentally use finite precision in both formats. In theory, if we understand the details of storing numbers in the IEEE 754 format, we can predict how a given decimal value is represented in binary. But in practice, most people won’t do this computation for every floating-point value they use. We can’t guarantee that a FLOAT
column in the database will be given only values that are cooperative, so our application should assume that any value in this column may have been rounded.
Meet the IEEE 754 format
Some databases support related data types called DOUBLE PRECISION
and REAL
. The precision that these data types and FLOAT
support varies by database implementation, but they all represent floating-point values with a finite number of binary digits, so they all have similar rounding behavior.
Using FLOAT
in SQL#
Some databases can compensate for the inexactness and display the intended value.
But the actual value stored in the FLOAT
column may not be exactly this value. If we magnify the value by a billion, we see the discrepancy:
We may expect the magnified value returned by the previous query to be 59950000000.00. This shows that the value 59.95 has been rounded to a value that can be represented in the finite precision offered by the IEEE 754 binary format. In this case, the value is within one ten-millionth, which is close enough for many calculations.
However, it’s not close enough for some other kinds of calculations to be accurate. One example is using a FLOAT
in an equality comparison.
The query has been executed successfully, but there are no matched rows.
We saw before that the value stored in hourly_rate
is actually slightly more than 59.95. Thus, even though we assigned the value 59.95 to this column for account_id
123, now the row fails to match the previous query.
One common workaround for this issue is to treat floating-point values as “effectively equal” if they are close within a small threshold. To do this, we can subtract one value from the other, and use SQL’s absolute value function ABS()
to strip the sign from the difference. If the result is zero, then the two values were exactly equal. If the result is small enough, then the two values can be treated as effectively equal. The following query succeeds in finding the row:
However, the difference is still large enough that a comparison of finer precision fails:
The appropriate threshold varies because the absolute difference between the base-10 value and the rounded base-2 value varies.
Another example of the inexact nature of FLOAT
causing accuracy problems is when we calculate aggregates of many values. For example, if we use SUM()
to add up the floating-point values in a column, the sum accumulates the discrepancy caused by rounding all the values.
The cumulative impact of inexact floating-point numbers is even more severe when we are calculating the aggregate product of a set of numbers instead of the sum. The difference seems small, but it compounds. For example, if we multiply the value 1 by a factor of exactly 1.0, the result is always 1. It doesn’t matter how many times we apply this factor. However, if the factor is actually 0.999, this has a different result. If we multiply a value of one by 0.999 a thousand times in succession, we get a result of about 0.3677. The more times we multiply, the more the discrepancy grows.
A good example of applying multiplication many times in succession is the calculation of compound interest in a financial application. Using inexact floating-point numbers introduces an error that seems tiny but grows as it compounds on itself. So, using exact values in financial applications is important.